# 4-bit quantization inference
**Deepseek R1 0528 AWQ** · adamo1139 · MIT
Large Language Model · Transformers · 161 downloads · 2 likes
The 4-bit AWQ quantized version of the DeepSeek-R1-0528 671B model, suited to deployment on high-end GPU nodes.
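At 671B parameters, even the 4-bit weights span hundreds of gigabytes, so inference means sharding across a node's GPUs. As a rough sketch (not taken from this listing), vLLM can serve an AWQ checkpoint with tensor parallelism; the repo id and GPU count below are illustrative assumptions.

```python
# Hypothetical sketch: serving a large AWQ-quantized checkpoint with vLLM.
# The repo id and tensor_parallel_size are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="someorg/DeepSeek-R1-0528-AWQ",  # hypothetical repo id
    quantization="awq",                    # weights are AWQ 4-bit
    tensor_parallel_size=8,                # shard across 8 GPUs on one node
)

params = SamplingParams(temperature=0.6, max_tokens=256)
outputs = llm.generate(["Explain AWQ quantization in one paragraph."], params)
print(outputs[0].outputs[0].text)
```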
**Gemma 3 12b It 4bit DWQ** · mlx-community
Large Language Model · 554 downloads · 2 likes
A 4-bit DWQ quantized version of the Gemma 3 12B model for the MLX framework, supporting efficient text generation.
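For MLX checkpoints like this one, the mlx-lm package provides a simple load/generate interface on Apple silicon. A minimal sketch, with the repo id assumed for illustration:

```python
# Minimal sketch, assuming the mlx-lm package; the repo id is illustrative.
from mlx_lm import load, generate

# Load a 4-bit MLX checkpoint from the Hugging Face Hub (hypothetical id).
model, tokenizer = load("mlx-community/gemma-3-12b-it-4bit-DWQ")

text = generate(model, tokenizer, prompt="Write a haiku about autumn.", max_tokens=64)
print(text)
```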
**GLM 4 32B 0414.w4a16 Gptq** · mratsim · MIT
Large Language Model · Safetensors · 785 downloads · 2 likes
A 4-bit GPTQ quantization (w4a16) of GLM-4-32B-0414, suited to consumer-grade hardware.
**Google Gemma 2 27b It AWQ** · mbley
Large Language Model · Safetensors · 122 downloads · 2 likes
Gemma 2 27B IT quantized to 4 bits with AutoAWQ, suited to dialogue and instruction-following tasks.
**Qwq 32B Preview AWQ** · KirillR · Apache-2.0
Large Language Model · Transformers · English · 2,247 downloads · 26 likes
The AWQ 4-bit quantized version of QwQ-32B-Preview; it significantly reduces memory usage and compute requirements, making it suitable for deployment on resource-constrained hardware.
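AWQ checkpoints in this class usually load straight through Hugging Face Transformers (with the autoawq backend and accelerate installed), with the quantization config read from the checkpoint itself. A minimal sketch with an assumed repo id:

```python
# Minimal sketch: loading an AWQ 4-bit checkpoint with Hugging Face Transformers.
# Requires the autoawq package; the repo id is an illustrative assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KirillR/QwQ-32B-Preview-AWQ"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("How many primes are there below 100?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```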
**Llama 2 7b MedQuAD** · EdwardYu · Apache-2.0
Large Language Model · 27 downloads · 2 likes
A medical question-answering model fine-tuned from Llama-2-7b-chat on the MedQuAD dataset.
**Falcon 7B Instruct GPTQ** · TheBloke · Apache-2.0
Large Language Model · Transformers · English · 189 downloads · 67 likes
The 4-bit GPTQ quantized version of Falcon-7B-Instruct, produced with AutoGPTQ, suited to efficient inference in resource-constrained environments.
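GPTQ checkpoints follow the same pattern in Transformers, provided a GPTQ backend (e.g. optimum with auto-gptq or gptqmodel) is installed; the quantization parameters are read from the checkpoint's config. A sketch under those assumptions, with an illustrative repo id:

```python
# Minimal sketch: running a GPTQ 4-bit checkpoint via Transformers.
# Assumes a GPTQ backend is installed; the repo id is an illustrative assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "TheBloke/falcon-7b-instruct-GPTQ"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("Give three uses for 4-bit quantization.", max_new_tokens=120)[0]["generated_text"])
```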